Lukas Lehner and Maximilian Trenkmann
with thanks to Julia Schulte-Cloos
2023-01-12
Automatable reports
Version control
Dissemination and academic websites
Containerisation for reproducible environments
Encryption and advanced programming
🙌 Benefit yourself! 🙌
‘Create a better relationship with your future self’
🚀 That’s the future of the social sciences! 🚀
Journals and funding agency requirements, see e.g. the Sherpa Romeo Database or the Plan S Journal Checker Tool
Open science and data sharing
Confidence in your own work, easier collaboration and smoother workflows
Replicability refers to situations in which a researcher obtains new data to reach the same scientific conclusions as a previous study, whereas reproducibility refers to situations in which the original researcher’s software, code, and data are used to regenerate the results.
✅ Replication standards: guidelines, protocols, and software designed to help researchers share, analyze, archive, preserve, distribute, catalog, translate, verify, and replicate scholarly research data and analyses across disciplines. Includes proposals to improve the norms around data sharing and replication in scientific research.
Science is not built upon blind trust, but on verifiability: science as “organized skepticism” (Merton, 1947). Only when raw data and other research material are shared can such organized skepticism be implemented and science self-correct. One aspect of good scientific practice is Open Data.
Reliable infrastructure for storage and publication (e.g., subject-specific repositories, institutional repositories)
Plan S principle: “from 2021, scientific publications that result from research funded by public grants must be published in compliant Open Access journals or platforms.” (Sherpa Romeo database; fairsharing.org)
Integrate computer code with software documentation in a single document
`read.csv('./data/foo.csv')`
`sessionInfo()`
… `figs`, `data`, etc.)
- ./data
    + `raw_data.csv`
    + `tidy_data.csv`
    + `codebook.txt`
- ./analysis
- ./figures
    + ./interaction_plot.png
    + ./bar_plot.png
- ./paper
- ./presentation
- ./README.md
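A minimal sketch of how the folder layout above could be created and used from R. The project name `my_project`, the file `session_info.txt`, and the use of the built-in `cars` data as placeholder content are assumptions made for illustration only.

```r
# Build the layout in a temporary directory so no real project is touched.
root <- file.path(tempdir(), "my_project")  # hypothetical project name
subdirs <- c("data", "analysis", "figures", "paper", "presentation")
for (d in subdirs) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}
file.create(file.path(root, "README.md"))

# Store a small placeholder data set where the tree expects the raw data.
write.csv(head(cars), file.path(root, "data", "raw_data.csv"), row.names = FALSE)

# Read it back with a path relative to the project root, not an absolute one.
old_wd <- setwd(root)
raw <- read.csv("./data/raw_data.csv")
setwd(old_wd)

# Record the R version and loaded packages next to the analysis code.
writeLines(
  capture.output(sessionInfo()),
  file.path(root, "analysis", "session_info.txt")  # hypothetical file name
)

print(list.files(root))
```

Because every path is built relative to the project root, the same script should run unchanged on a collaborator's machine once the folder is copied or cloned.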
`snake_case`
`camelCase`
`rm -rf /`
`cd ..\documents`
`cd C:\ProgramData`
`rm file.md`
`rm -r ./directory`
`mv ./oldfilename.md ./newfilename.md`
`cat file.md`
`ls`
`>>`, `&&`, `;`, `|`
`CTRL+C` & `CTRL+V`: depends on the environment
`CTRL+C` & `CTRL+V`
`CTRL+Shift+C` + `CTRL+Shift+V`
`CTRL+C` Kill a process
`CTRL+Z` Pause a process
Markdown as a human readable way to style text
“Markdown is a text-to-HTML conversion tool for web writers. Markdown allows you to write using an easy-to-read, easy-to-write plain text format, then convert it to structurally valid XHTML (or HTML).” John Gruber, creator of Markdown
R and RStudio (not the only IDE that supports RMarkdown; Visual Studio Code is also a great choice)
RMarkdown integrates R code into Markdown language through knitr
Quarto: extension of RMarkdown, optimised for language interoperability & CLI
`**bold text**` or equivalently `__bold__`
`*italic*` or equivalently `_italic_`
`# A level-one section`
`## A level-two section with a [link](/url)`
`# An unnumbered section {-}`, or equivalently `# An unnumbered section {.unnumbered}`
`{#sec:introduction}`
`# Reproducible research outputs` → `{#reproducible-research-outputs}`.
Bullet list
Numbered lists
`^[footnote]`
…mostly a matter of taste 🍷🍺
Control how code and its products appear in your compiled report or manuscript. Code chunks are required to have unique names, e.g. {r data2017-tidy}
Define conditions under which the code is evaluated and how its output is processed within the document. Most frequent options include: eval, include, results, echo. Comprehensive list online, in the RMarkdown reference guide, and for Quarto. Most IDEs allow you to easily switch between different chunks.
→ old-school way to specify chunk options
```{r elephant-chunk-1, out.width="20%", fig.align="center", fig.cap="Elephant in the room", echo="fenced"}
knitr::include_graphics(path = "figs/elephant.jpg")
```
→ more recently, chunk options can be specified as comments within the actual code chunks to increase readability
The slope of the regression is 3.93
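A number in a sentence like the one above can be computed by inline code instead of being typed by hand, so it updates whenever the data or model change. A minimal sketch, assuming the value comes from a regression of `dist` on `speed` in R's built-in `cars` data (an assumption; the source does not name the data set):

```r
# Assumption: the built-in `cars` data stands in for the data behind the sentence.
fit <- lm(dist ~ speed, data = cars)

# Extract the slope coefficient and round it for reporting.
slope <- round(coef(fit)[["speed"]], 2)

# In an .Rmd or .qmd file the value would be inserted with inline code,
# e.g. `r slope`; here the sentence is assembled with paste() instead.
sentence <- paste("The slope of the regression is", slope)
print(sentence)
```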
`output`, `title`, `author`, `date`
---
title: "Writing a reproducible research paper"
author: "Julia Schulte-Cloos"
date: 2023-12-01
output:
  bookdown::pdf_document2:
    keep_tex: yes
    number_sections: false
    toc: false
documentclass: scrartcl
---
In doubt about YAML validity? Use an available YAML linter.
You can render your document using globally specified parameters (set in the YAML header) that affect how your code is evaluated, e.g. by focussing only on a subset of your data.
---
title: "My Document"
params:
  alpha: 0.1
  ratio: 0.1
---
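A minimal sketch of how such parameters might be used inside the document's code. When a document is rendered, knitr exposes the YAML values as a list called `params`; the list is created by hand here so the snippet runs on its own. What `alpha` and `ratio` stand for (a significance level and a share of rows) is an assumption for illustration, as is the use of the built-in `cars` data.

```r
# Hand-made stand-in for the `params` list that is built from the YAML header.
params <- list(alpha = 0.1, ratio = 0.1)

# Assumption: `ratio` is the share of rows to keep (here the first 10% of `cars`).
n_keep <- round(nrow(cars) * params$ratio)
subset_data <- head(cars, n_keep)

# Assumption: `alpha` is the significance level for a confidence interval.
fit <- lm(dist ~ speed, data = cars)
ci <- confint(fit, level = 1 - params$alpha)

print(nrow(subset_data))
print(ci)
```

The defaults from the header can be overridden at render time without editing the file, for example with `rmarkdown::render("report.Rmd", params = list(ratio = 0.5))`, where `report.Rmd` is a hypothetical file name.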
Turn your Rmd file into a Quarto document (…or the other way around!) What are some of the differences?
`markdown` and `knitr`
Table of contents
Paragraphs and indentation - Pandoc option indent: true in the YAML header
Page margins and spacing - geometry option in the YAML header
Include your literature.bib file in the YAML header (YAML key: bibliography:). Cite any entry recorded in the .bib file with @palmerdata.2020 for inline citations and [@palmerdata.2020, p.10] for all other references.
If the document specifies a csl style, Pandoc will convert Markdown references, i.e., @palmerdata.2020, to ‘hardcoded’ text and a hyperlink to the reference section in your document.
If your document specifies a citation package like biblatex or natbib along with the related options, Pandoc will create the corresponding LaTeX commands (e.g. \autocite or \citep) from your Markdown references (not recommended, because you lose flexibility regarding output formats!)
You can cross-reference sections, figures, tables or equations (e.g., \@ref(fig:elephant)).
Cross-referencing is possible in all output formats that are part of the bookdown package (e.g. bookdown::pdf_document2). You can reference a figure by \@ref(fig:elephant) where ‘elephant’ is the name of the code chunk that produces the figure.
Quarto uses a slightly different syntax: @fig-elephant
If you specify the colorlinks: true option in the YAML header, the hyperlinks to the respective figure will be colored.
If you do not specify a section label, Pandoc will automatically assign a label based on the title of your header. For more details, see the Pandoc manual. If you wish to add a manual label to a header, add {#mylabel} to the end of the section header.
### References
::: {#refs}
:::
`section-refs` Lua filter
Advantages? 🤔
Approach 1
`eval=knitr::is_html_output()`
`::: {.not-in-format .latex}`
Approach 2
`execute` YAML option in a .qmd document
`execute` options (no indentation, for any type of format)
format specific option (indentation, specific for each format)
---
format:
  html:
    toc: true
    code-fold: true
    execute:
      echo: true
  pdf:
    toc: false
    execute:
      echo: false
execute:
  warning: false
  message: false
---
The code chunk option ref.label takes a vector of chunk labels to retrieve the content of the respective chunks.
ref.label can also evaluate R code, e.g. to retrieve the code of all labels within a document (knitr::all_labels()).
# Appendix: All code for this presentation
```{r}
#| ref.label: !expr knitr::all_labels()
#| echo: true
#| eval: false
```
…or a subset of chunks that are also evaluated when rendering the document:
labs <- knitr::all_labels(eval == TRUE)
```{r}
#| ref.label: !expr labs
```
`knitr` engine, `jupyter` can be used
---
title: "My Document"
jupyter: python3
---
… `execute`)
`execute` key can replace a chunk that globally sets knitr options (as in the RMarkdown framework)
---
title: "All code chunks in this document are not printed by default"
execute:
  echo: false
---
Quarto offers more control regarding the inclusion of author-related meta-data (names, affiliations, contributions to the work) that is printed as part of the title, in some output formats. See the full documentation
author:
  - name:
      given: Norah
      family: Jones
      literal: Norah Jones
    attributes:
      corresponding: true
You can also include the attribute {.appendix} after any header (at any point of your document) to delegate the respective section to the appendix.
⚠️ Quarto uses a slightly different syntax to cross-reference figures: @fig-elephant
Quarto allows you to add a dedicated code chunk option #| layout-ncol: 2 to your code chunks to include several figures side by side.
This is very powerful in conjunction with #| fig-subcap:, which allows you to specify a caption for each figure.
#| label: fig-graphsidebyside
#| fig-subcap: ["Caption of left figure","Caption of right figure"]
… `modelsummary`)
Integrate two tables side-by-side, each with its own sub-caption (hint: Quarto makes it quite easy to solve this task)
Tools for Efficient Workflows